68 research outputs found

    PubDNA Finder in a Nutshell. Searching the Life Sciences Literature with Sequences of Nucleic Acids

    Biomedical researchers and clinicians working with molecular technologies in routine clinical practice often need to review the available literature to gather information regarding specific sequences of nucleic acids. This includes, for instance, finding articles related to a concrete DNA sequence, or identifying empirically validated primer/probe sequences to evaluate the presence of different micro-organisms. Unfortunately, these hard and time-consuming tasks often have to be performed manually by the researchers themselves, since no publicly available biomedical literature search engine (e.g. PubMed or PubMed Central (PMC)) provides the required search functionalities. In this article, we describe PubDNA Finder, a web service that enables users to perform advanced searches on PubMed Central-indexed full text articles with sequences of nucleic acids.

    Using Hierarchical Task Network Planning Techniques to Create Custom Web Search Services over Multiple Biomedical Databases

    We present a novel method to create complex search services over public online biomedical databases using hierarchical task network planning techniques. In the proposed approach, user queries are regarded as planning tasks (goals), while basic query services provided by the databases correspond to planning operators (POs). Each individual source is then mapped to a set of POs that can be used to process primitive (simple) queries. Advanced search services can be created by defining decomposition methods (DMs). The latter can be regarded as “recipes” that describe how to decompose non-primitive (complex) queries into sets of simpler subqueries following a divide-and-conquer strategy. Query processing proceeds by recursively decomposing non-primitive queries into smaller queries until primitive queries are reached that can be processed using planning operators. Custom web search services can be created from the generated planners to provide biomedical researchers with valuable tools to process frequent complex queries.
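The recursive decomposition described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' planner: the operator names, the decomposition method, and the placeholder results are all hypothetical, and decomposition is interleaved with execution here, whereas a full HTN planner would typically build a plan first.

```python
from typing import Callable, Dict, List, Tuple

Query = Tuple[str, str]  # (task name, argument)

# Planning operators (POs): primitive queries that a single source answers
# directly. Both operators and their results are hypothetical placeholders.
OPERATORS: Dict[str, Callable[[str], List[str]]] = {
    "gene_to_proteins": lambda gene: [f"protein_of_{gene}"],
    "protein_to_articles": lambda protein: [f"article_about_{protein}"],
}


def articles_for_gene(gene: str, solve: Callable[[Query], List[str]]) -> List[str]:
    """Decomposition method (DM): split one complex query into simpler subqueries."""
    articles: List[str] = []
    for protein in solve(("gene_to_proteins", gene)):
        articles.extend(solve(("protein_to_articles", protein)))
    return articles


# Decomposition methods, keyed by the non-primitive task they handle.
METHODS: Dict[str, Callable[[str, Callable[[Query], List[str]]], List[str]]] = {
    "gene_to_articles": articles_for_gene,
}


def solve(query: Query) -> List[str]:
    """Decompose recursively until only primitive queries remain."""
    task, arg = query
    if task in OPERATORS:   # primitive query: run the operator
        return OPERATORS[task](arg)
    if task in METHODS:     # non-primitive query: apply its decomposition method
        return METHODS[task](arg, solve)
    raise ValueError(f"no operator or method for task {task!r}")


print(solve(("gene_to_articles", "BRCA1")))
```

Adding a new search service in this sketch means registering one more entry in `METHODS`; the primitive operators stay unchanged.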

    Detectors could spot plagiarism in research proposals

    Having all been involved in proposal evaluation, we believe the studies indicate that a text matching analysis of research proposals could reduce plagiarism in subsequent publications. For instance, when European Commission evaluators have met in the past to evaluate research proposals, they received printed copies which had to be returned before the panel members left, and had no computer access during deliberations. A plagiarism detector using text-mining methods could be used instead of the current security measures. Such a system could, in principle, detect similarities to previous submissions and uncited sources using advanced document segmentation. Only official agencies have access to confidential proposals and the funds to experiment with automated plagiarism detectors. It is important that they investigate these approaches to reducing the possibility of scientific misconduct.

    A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. On this set, we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks, including text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
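A first-pass recognizer of the kind mentioned in this abstract might look like the sketch below. It is an assumption-laden illustration, not the published recognizer: the alphabet (A, C, G, T, U only), the `min_len` threshold, and the function name are hypothetical, and the knowledge-based filtering stage is not modeled.

```python
from typing import List

NUCLEOTIDES = set("ACGTUacgtu")  # assumed alphabet; ambiguity codes are ignored


def extract_candidates(text: str, min_len: int = 10) -> List[str]:
    """Scan text with a two-state machine and return candidate sequences.

    State OUTSIDE: `current` is empty and we wait for a nucleotide letter.
    State INSIDE: `current` holds the run read so far; any other character
    ends the run, which is kept only if it has at least `min_len` letters.
    """
    candidates: List[str] = []
    current: List[str] = []
    for ch in text + " ":  # trailing sentinel flushes a run that ends the text
        if ch in NUCLEOTIDES:
            current.append(ch)
        else:
            if len(current) >= min_len:
                candidates.append("".join(current).upper())
            current = []
    return candidates


sample = "The forward primer 5'-ACGTTGCAAGCTTGCA-3' was used to tag a cat."
print(extract_candidates(sample))
```

The length threshold discards short accidental runs such as "tag" or "cat", but longer false positives can still pass, which is why a second filtering stage is needed.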

    A Method for Indexing Biomedical Resources over the Internet

    A large number of biomedical resources are publicly available over the Internet, and this number grows every day. Biomedical researchers face the problem of locating, identifying and selecting the most appropriate resources according to their interests. Some resource indexes can be found on the Internet, but they only provide information and links related to resources created by the institution that owns each website. In this paper we propose a novel method for extracting information from the literature and creating a Resourceome, i.e. an index of biomedical resources (databases, tools and services), in a semi-automatic way. In this approach we consider only the information provided by the abstracts of relevant papers in the area. Building a comprehensive resource index is the first step towards the development of new methodologies for the automatic or semi-automatic construction of complex biomedical workflows, which allow several resources to be combined to obtain higher-level functionalities.

    Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory

    [Abstract] Background: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). Proyecto colaborativo de integración de datos genómicos; PI17/0156

    SNOMED2HL7: a tool to normalize and bind SNOMED CT concepts to the HL7 Reference Information Model

    [Abstract] BACKGROUND: Current clinical research and practice requires interoperability among systems in a complex and highly dynamic domain. There has been a significant effort in recent years to develop integrative common data models and domain terminologies. Such efforts have not completely solved the challenges associated with clinical data that are distributed among different and heterogeneous institutions with different systems to encode the information. Currently, when providing homogeneous interfaces to exploit clinical data, certain transformations still involve manual and time-consuming processes that could be automated. OBJECTIVES: There is a lack of tools to support data experts adopting clinical standards. This absence is especially significant when links between data model and vocabulary are required. The objective of this work is to present SNOMED2HL7, a novel tool to automatically link biomedical concepts from widely used terminologies, and the corresponding clinical context, to the HL7 Reference Information Model (RIM). METHODS: Based on the recommendations of the International Health Terminology Standards Development Organisation (IHTSDO), the SNOMED Normal Form has been implemented within SNOMED2HL7 to decompose concepts and to reduce the number of options for storing the same information. The binding of clinical terminologies to HL7 RIM components is the core of SNOMED2HL7, where terminology concepts have been annotated with the corresponding options within the interoperability standard. A web-based tool has been developed to automatically provide information from the normalization mechanisms and the terminology binding. RESULTS: SNOMED2HL7 binding coverage includes the majority of the concepts used to annotate legacy systems. It follows HL7 recommendations to solve binding overlaps and provides the binding of the normalized version of the concepts. The first version of the tool, available at http://kandel.dia.fi.upm.es:8078, has been validated in EU-funded projects to integrate real-world data for clinical research, with an accuracy of 88.47%. CONCLUSIONS: This paper presents the first initiative to automatically retrieve concept-centered information required to transform legacy data into widely adopted interoperability standards. Although additional functionality will extend capabilities to automate data transformations, SNOMED2HL7 already provides the functionality required for the clinical interoperability community. Instituto de Salud Carlos III; PI13/0202

    On a meaningful integration of web services in data-intensive biomedical environments: The DICODE approach

    This paper reports on an innovative approach that aims to reduce information management costs in data-intensive and cognitively complex biomedical environments. Recognizing the importance of prominent high-performance computing paradigms and large data processing technologies, as well as collaboration support systems, to remedy data-intensive issues, it adopts a hybrid approach by building on the synergy of these technologies. The proposed approach provides innovative Web-based workbenches that integrate and orchestrate a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities.

    An Automatic Method for Retrieving and Indexing Catalogues of Biomedical Courses

    Although a wide range of information about Biomedical Informatics education and courses is available on different websites, it is usually not exhaustive and is difficult to update. We propose a new methodology based on information retrieval techniques for automatically extracting, indexing and retrieving information about educational offers. A web application has been developed to make such information available in an inventory of courses and educational offers.

    CDAPubMed: a browser extension to retrieve EHR-based biomedical literature

    Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with the more than 200,000 citations retrieved for breast neoplasm alone, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open-source tool that can be freely used for non-profit purposes and integrated with other existing systems.
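The query-generation step (iii) in this abstract can be illustrated with a short sketch. It is not CDAPubMed's code: the function names and the example terms are hypothetical, and it assumes that the MeSH terms have already been identified in the clinical document. The `[MeSH Terms]` field tag and the `term` URL parameter follow PubMed's public search syntax.

```python
from typing import Iterable
from urllib.parse import urlencode

PUBMED_SEARCH_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_pubmed_query(mesh_terms: Iterable[str]) -> str:
    """Combine MeSH terms with AND so that every added term narrows the search."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)


def build_pubmed_url(mesh_terms: Iterable[str]) -> str:
    """Return a PubMed search URL for the combined query."""
    return PUBMED_SEARCH_URL + "?" + urlencode({"term": build_pubmed_query(mesh_terms)})


# Hypothetical terms standing in for features found in one patient document.
terms = ["Breast Neoplasms", "Female", "Tamoxifen"]
print(build_pubmed_query(terms))
print(build_pubmed_url(terms))
```

Because the terms are joined with AND, each additional patient feature can only keep or reduce the number of matching citations, which is consistent with the drop in results reported in the abstract.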